We analyze a gene co-expression network under the random matrix theory
(RMT) framework. The nearest neighbor spacing distribution of the
adjacency matrix of this network follows the Gaussian orthogonal
statistics of RMT. The spectral rigidity test follows the random matrix
prediction for a certain range and deviates afterwards. Eigenvector
analysis of the network using the inverse participation ratio (IPR)
suggests that the statistics of the bulk of the eigenvalues of the
network are consistent with those of a real symmetric random matrix,
whereas the eigenvectors of a few eigenvalues are localized. Based on
these IPR calculations, we can divide the eigenvalues into three sets:
(A) the non-degenerate part that follows RMT; (B) the non-degenerate
part, at both ends and at intermediate eigenvalues, which deviates from
RMT and is expected to contain information about important nodes in the
network; (C) the degenerate part with zero eigenvalue, which fluctuates
around the RMT-predicted value. We identify the nodes corresponding to
the dominant modes of the corresponding eigenvectors and analyze their
structural properties.

PACS numbers: 89.75.Hc,64.60.Cn,89.20.-a 



I. INTRODUCTION 
A. Complex Networks 

Gene expression information captured in microarray
data for a variety of environmental and genetic perturbations,
in conjunction with other sources such as protein-protein and
protein-DNA interaction and operon organization
data, promises to yield unprecedented insights into
the organization and functioning of biological systems
[1, 2]. It has been increasingly realized that dissecting
the genetic and chemical circuitry prevents us from further
understanding the biological processes as a whole. In
order to understand the complexities involved, all reactions
and processes should be analyzed together. To this
end, we use network theory, which has rapidly gained
recognition as a tool for studying systems that can be defined
in terms of units and the interactions among them. These
studies revealed that the available data from gene
co-expression networks share some unexpected features with
other complex networks as diverse as the Internet routers.
In order to understand the behavior of complex systems
such as gene co-expression networks, several simple models,
which are based on simple principles and capture some
essential features of the system, have been introduced
0-ll].

In this paper, using network theory and random
matrix theory (RMT), we analyze a gene co-expression
network. First we generate the network from gene
co-expression data collected from six brain regions that are
metabolically relevant to Alzheimer's disease, using
an appropriate threshold, and then study the spectra of
this network under the RMT framework. Information
about the genes that are preferentially expressed dur- 
ing the course of Alzheimer's disease could improve our
understanding of the molecular mechanisms involved in
the pathogenesis of this common cause of cognitive im- 
pairment in senior persons, provide new opportunities in 
the diagnosis, early detection, and tracking of this disor- 
der, and provide novel targets for the discovery of inter- 
ventions to treat and prevent this disorder. Information 
about the genes that are preferentially expressed in rela- 
tionship to normal neurological aging could provide new 
information about the molecular mechanisms that are in- 
volved in normal age-related cognitive decline and a host 
of age-related neurological disorders, and they could pro- 
vide novel targets for the discovery of interventions to 
mitigate some of these deleterious effects. 

Co-expression networks are also known as relevance
networks. The terminology was introduced
by Butte and Kohane 0. Since then the properties of
relevance networks have been extensively studied Q.

The paper is organized as follows. After this introductory
subsection on the relevance of network theory and gene
co-expression networks, we discuss recent outcomes of the
RMT analysis of complex networks in subsection B. The main
goals of our eigenvector analysis are described in
subsection C. Section II describes the important
achievements of RMT and explains the properties
we use in our analysis. Section III describes
the data and the network construction. Section IV presents
various numerical results. Section V concludes the paper
with a discussion on the relevance of the current analysis
and suggests future directions.



B. RMT of Network Spectra 

Our previous work Q showed that various widely studied
model networks follow the random matrix predictions of
Gaussian orthogonal ensemble (GOE) statistics in the level
repulsion domain. We demonstrated that the nearest neighbor
spacing distribution (NNSD) of the protein-protein interaction
network of budding yeast follows the RMT prediction
as well 0. This is a promising result which suggests
that these networks can be modeled by a random matrix
chosen from an appropriate ensemble. The universal
GOE statistics of eigenvalue fluctuations can be understood
as a kind of randomness spreading over the
protein-protein interaction network and the model networks
capturing real-world properties. Recently, the covariance
matrix of amino acid displacements has been analyzed under
the RMT framework. The analysis shows that the bulk
of the eigenvalues follows the universal GOE statistics of RMT.
In the present paper, we analyze a gene co-expression network
[g under the RMT framework. First we calculate the nearest
neighbor spacing distribution of the network spectra, and
then perform eigenvector analysis to detect nodes having
a specific contribution to the network.



C. Important nodes and connections 

It is now well known that various real-world systems
are scale-free networks ^. The scale-free nature of networks
suggests that there exist a few nodes with very high
degrees. Motivated by this finding, it has been suggested
that, since these nodes hold the whole network together,
they are the most important ones. Other
analyses (by Newman and others) of real-world networks
show that complex networks have community or
module structure. Modules are divisions of the
network nodes into groups within which the connections are
dense, but between which they are sparser. The modularity
concept assumes that system functionality can be
partitioned into a collection of modules and that each module
performs an identifiable task, separable from the functions
of other modules [l^. Analysis of module structure
involves the betweenness measure. The betweenness of an
edge is defined as the number of shortest paths between
pairs of nodes going through the edge. Betweenness studies
of real-world networks suggest that the nodes connecting
the different communities are the most important
ones, which has been verified in metabolic networks
by Amaral et al. [13].

The above description emphasizes the importance of
nodes depending on their position in the network, as
these nodes characterize network properties. On the
other hand, the Erdős-Rényi (ER) and Strogatz-Watts (SW)
models emphasize the importance of random connections
in networks. In the ER model any two nodes are
connected with probability p. One of the most interesting
properties of the ER model is the sudden emergence of
various global properties, such as the emergence of a giant
cluster. As p increases, while the number of nodes in the graph
remains constant, the giant cluster emerges through a
phase transition. Further, the SW model shows the
small-world transition with the fine tuning of the number of
random connections [15]. Our previous RMT analysis of
the spectra of SW model networks shows that at the
SW transition there is a transition to the spreading of
randomness in the network, characterized by the correlations
between nearest eigenvalues. In the current paper
we analyze the spectra of the gene co-expression network
under the RMT framework. In particular, we study the
eigenvectors of the adjacency matrix of this network. The
spectrum has two parts: one part that follows the RMT
predictions of universal GOE statistics and another part that
does not. The eigenvectors deviating
from the RMT prediction provide information about the
influential or important nodes in the network.
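
The giant-cluster transition of the ER model mentioned above can be illustrated numerically. The following is a minimal sketch and not part of the original analysis: the function name `largest_cluster_fraction`, the graph size, and the two values of p are illustrative assumptions. For an ER graph the transition is expected near a mean degree p(N - 1) of about one, so one value is chosen well below and one well above that point.

```python
import numpy as np

def largest_cluster_fraction(n, p, rng):
    """Fraction of nodes in the largest connected cluster of one ER graph."""
    # Draw each of the n(n-1)/2 possible edges independently with probability p.
    upper = np.triu(rng.random((n, n)) < p, k=1)
    adj = upper | upper.T
    unvisited = np.ones(n, dtype=bool)
    largest = 0
    for start in range(n):
        if not unvisited[start]:
            continue
        # Depth-first search over the cluster containing `start`.
        unvisited[start] = False
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            neighbours = np.flatnonzero(adj[node] & unvisited)
            unvisited[neighbours] = False
            stack.extend(neighbours.tolist())
        largest = max(largest, size)
    return largest / n

rng = np.random.default_rng(0)
n = 400  # illustrative size, not taken from the paper
below = largest_cluster_fraction(n, 0.2 / n, rng)  # mean degree about 0.2
above = largest_cluster_fraction(n, 4.0 / n, rng)  # mean degree about 4
print(f"largest cluster fraction: {below:.3f} (p = 0.2/N), {above:.3f} (p = 4/N)")
```

Below the transition the largest cluster contains only a handful of nodes, while above it a single cluster spans most of the graph.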

II. RANDOM MATRIX STATISTICS 

RMT deals with the statistical properties of matrices 
with independent random entries. To be self-contained,
we give a brief introduction to RMT here and explain
the RMT properties of eigenvector components
that we use in our analysis. RMT was initially
proposed to explain the statistical properties of nuclear
spectra [l^. Later this theory was successfully applied in
the study of the spectra of different complex systems such
as disordered systems, quantum chaotic systems, large
complex atoms [l^- Recent studies illustrate the useful- 
ness of RMT in understanding the statistical properties 
of the empirical cross-correlation matrices appearing in 
the study of multivariate time series such as the
price fluctuations in the stock market [l^, EEG data
of the brain [19], the variation of various atmospheric
parameters [2^, etc. Recent analyses of complex networks under
the RMT framework [1, [H [M [H show that various network
models and real-world networks also follow universal GOE
statistics. Furthermore, localization of eigenvectors has
also been used to analyze various structural and dynamical
properties of real and model networks [2^ [IJ] .

In the following, we introduce the spacing distribution
and the $\Delta_3$ statistics of random matrices. We denote the
eigenvalues of a network by $\lambda_i$, $i = 1, \ldots, N$, where $N$
is the size of the network and
$\lambda_1 < \lambda_2 < \lambda_3 < \cdots < \lambda_N$.
In order to get universal properties of the fluctuations
of eigenvalues, one usually unfolds the eigenvalues
by a transformation $\bar{\lambda}_i = \bar{N}(\lambda_i)$, where
$\bar{N}$ is the averaged integrated eigenvalue density [16].
Since we do not have any analytical form for $\bar{N}$, we
numerically unfold the spectrum by polynomial curve fitting
(for an elaborate discussion on unfolding, see Ref. fl^).
After unfolding, the average spacing is unity, independent
of the system. Using the unfolded spectra, we calculate the
spacings as $s_i = \bar{\lambda}_{i+1} - \bar{\lambda}_i$. The NNSD is
defined as the probability distribution $P(s)$ of these $s_i$.
In the case of GOE statistics,

$$P(s) = \frac{\pi}{2}\, s \exp\left(-\frac{\pi s^{2}}{4}\right) \qquad (1)$$
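
The unfolding and spacing steps described above can be sketched as follows. This is a hedged illustration rather than the procedure actually used for the paper: the polynomial degree, the rescaling of the eigenvalues to [-1, 1] before fitting, the function names, and the synthetic GOE test matrix are all assumptions made for the example.

```python
import numpy as np
from numpy.polynomial import polynomial as npoly

def unfold(eigenvalues, degree=9):
    """Unfold a spectrum by fitting a polynomial to its staircase function.

    The fitted polynomial plays the role of the averaged integrated
    eigenvalue density; `degree` is an assumed, tunable choice.
    """
    lam = np.sort(np.asarray(eigenvalues, dtype=float))
    staircase = np.arange(1, lam.size + 1)
    # Rescale to [-1, 1] so that the polynomial fit is well conditioned.
    x = 2.0 * (lam - lam[0]) / (lam[-1] - lam[0]) - 1.0
    coeffs = npoly.polyfit(x, staircase, degree)
    return npoly.polyval(x, coeffs)

def wigner_surmise(s):
    """GOE nearest neighbor spacing distribution, Eq. (1)."""
    return 0.5 * np.pi * s * np.exp(-0.25 * np.pi * s**2)

# Synthetic test case: a random real symmetric (GOE) matrix.
rng = np.random.default_rng(0)
n = 400
a = rng.normal(size=(n, n))
goe = (a + a.T) / 2.0

unfolded = unfold(np.linalg.eigvalsh(goe))
s = np.diff(unfolded)  # nearest neighbor spacings s_i
hist, edges = np.histogram(s, bins=20, range=(0.0, 4.0), density=True)
print("mean spacing after unfolding:", s.mean())
```

The histogram `hist` can then be compared bin by bin with `wigner_surmise` evaluated at the bin centres. For an adjacency matrix, the eigenvalues returned by `np.linalg.eigvalsh` would replace those of the synthetic matrix.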
The $\Delta_3$-statistic measures the least-square deviation of
the spectral staircase function representing the averaged
integrated eigenvalue density $N(\bar{\lambda})$ from the best straight
line fitting for a finite interval $L$ of the spectrum, i.e.,

$$\Delta_3(L; x) = \frac{1}{L} \min_{a,b} \int_{x}^{x+L} \left[ N(\bar{\lambda}) - a\bar{\lambda} - b \right]^{2} d\bar{\lambda} \qquad (2)$$

where $a$ and $b$ are obtained from a least-square fit. The average
over several choices of $x$ gives the spectral rigidity
$\Delta_3(L)$. For the GOE case, $\Delta_3(L)$ depends logarithmically
on $L$, i.e.,

$$\Delta_3(L) \sim \frac{1}{\pi^{2}} \ln L \qquad (3)$$
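
A numerical estimate of the spectral rigidity of Eq. (2) can be sketched as below. This is an illustrative approximation under stated assumptions, not the authors' code: the integral is replaced by an average over a uniform grid inside each window, and the number of windows, the grid size, and the two test spectra (an equally spaced "picket fence" and an uncorrelated Poisson spectrum) are choices made only for this example.

```python
import numpy as np

def delta3(unfolded, L, n_windows=50, n_grid=400):
    """Estimate Delta_3(L) from an unfolded spectrum.

    For each window [x, x + L] the staircase is sampled on a grid, a
    straight line is fitted by least squares, and the mean squared
    residual approximates (1/L) times the integral in Eq. (2).
    """
    lam = np.sort(np.asarray(unfolded, dtype=float))
    starts = np.linspace(lam[0], lam[-1] - L, n_windows)
    values = []
    for x in starts:
        t = np.linspace(x, x + L, n_grid)
        staircase = np.searchsorted(lam, t, side="right")  # levels <= t
        a, b = np.polyfit(t, staircase, 1)
        values.append(np.mean((staircase - a * t - b) ** 2))
    return float(np.mean(values))

def delta3_goe(L):
    """Leading logarithmic GOE behaviour, Eq. (3)."""
    return np.log(L) / np.pi**2

# Two reference spectra with unit mean spacing.
picket_fence = np.arange(200.0)                  # perfectly rigid spectrum
rng = np.random.default_rng(0)
poisson = np.cumsum(rng.exponential(size=2000))  # uncorrelated levels

print("picket fence:", delta3(picket_fence, 10.0))
print("Poisson     :", delta3(poisson, 10.0))
print("GOE, Eq. (3):", delta3_goe(10.0))
```

A rigid spectrum stays close to 1/12 for all L, while an uncorrelated spectrum grows linearly with L; GOE-like spectra lie between these two limits.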



The following sub-section explains the properties of 
eigenvectors of random matrices. 





FIG. 1: Adjacency matrix of the largest connected component
of the gene co-expression network with the threshold
value of 0.89. Nodes forming the largest connected cluster are
renumbered in sequential order for clear visualization.



A. Eigenvector analysis 

The distribution of eigenvector components is studied
to obtain system-dependent information. Let $u_l^k$ be
the $l$th component of the $k$th eigenvector $u^k$. The eigenvector
components of a GOE random matrix are Gaussian
distributed random variables; in this case the distribution of
$r = |u_l^k|^2$, in the limit of large matrix dimension, is given
by the Porter-Thomas distribution [2^, i.e.,

$$P(r) = \frac{N}{\sqrt{2\pi N r}} \exp\left(-\frac{N r}{2}\right) \qquad (4)$$

The Shannon entropy for a state whose components are
described by the above distribution is given, in the
large $N$ limit, by [25]

$$H_N = -N \int_0^{\infty} r \ln(r)\, P(r)\, dr \qquad (5)$$

Additionally, the inverse participation ratio (IPR) is
considered to study the RMT features of the eigenvectors.
The IPR of an eigenvector is defined as

$$I^k = \sum_{l=1}^{N} \left[ u_l^k \right]^{4} \qquad (6)$$

where $u_l^k$, $l = 1, \ldots, N$, are the components of eigenvector
$u^k$. The meaning of $I^k$ is illustrated by two limiting cases:
(i) a vector with identical components $u_l^k = 1/\sqrt{N}$
has $I^k = 1/N$, whereas (ii) a vector with one component
$u_1^k = 1$ and the remainder zero has $I^k = 1$. Thus,
the IPR quantifies the reciprocal of the number of eigenvector
components that contribute significantly. A
vector with components following distribution (4) has
$I^k \approx 3/N$.
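
The two limiting cases of the IPR and the random-vector benchmarks can be checked with a short calculation. The sketch below is illustrative only: the function names `ipr` and `shannon_entropy`, the matrix size, and the synthetic GOE matrix are assumptions, and the entropy is computed directly from the components as $-\sum_l (u_l^k)^2 \ln (u_l^k)^2$.

```python
import numpy as np

def ipr(vectors):
    """Inverse participation ratio, Eq. (6); columns are normalized eigenvectors."""
    return np.sum(vectors**4, axis=0)

def shannon_entropy(vectors):
    """Shannon entropy of the squared components of each column."""
    weights = vectors**2
    # Terms with zero weight contribute nothing (0 * ln 0 is taken as 0).
    safe = np.where(weights > 0, weights, 1.0)
    return -np.sum(weights * np.log(safe), axis=0)

n = 400  # illustrative dimension

# Limiting case (i): identical components 1/sqrt(N).
uniform = np.full((n, 1), 1.0 / np.sqrt(n))
# Limiting case (ii): a single non-zero component.
single = np.zeros((n, 1))
single[0, 0] = 1.0

# Eigenvectors of a synthetic GOE matrix as the random benchmark.
rng = np.random.default_rng(0)
a = rng.normal(size=(n, n))
_, goe_vectors = np.linalg.eigh((a + a.T) / 2.0)
mean_ipr = ipr(goe_vectors).mean()
mean_entropy = shannon_entropy(goe_vectors).mean()

print("uniform vector: IPR =", ipr(uniform)[0], " (1/N =", 1.0 / n, ")")
print("single-component vector: IPR =", ipr(single)[0])
print("GOE eigenvectors: N * <IPR> =", n * mean_ipr, " <entropy> =", mean_entropy)
```

For the random eigenvectors, N times the mean IPR comes out close to 3 and the mean entropy close to ln(N/2), the two reference values used in the analysis.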



III. DATA AND NETWORK CONSTRUCTION

The data-set (GSE5281) was obtained from Gene Expression
Omnibus @. Liang et al. studied gene expression
profiles from laser capture microdissected neurons
in six functionally and anatomically distinct regions
from clinically and histopathologically normal aged human
brains. From these data-sets only 74 normal samples
were used to construct the co-expression networks.
In the original study the Affymetrix Human Genome
U133 Plus 2.0 Array was used. This microarray contains
54675 oligonucleotide sets (probesets) representing the
expressed human genes for each sample. On the microarray
one gene is represented by one or more probesets.
Each probeset is built up from 25-mer oligonucleotides,
so-called probes [H]. In the present study
probesets are the units of observation. For the identification
of probesets the Affymetrix IDs were used. The
Pearson product-moment correlation was calculated for
the expression levels of each probeset pair, and those pairs
with a value greater than 0.88 are used to construct the gene
co-expression network. This network consists of 5000 nodes
and 1201480 undirected edges. Nodes represent probesets
denoting genes, and edges denote their co-expression
levels.

From this weighted network, we construct a sparse binary
network as follows. We choose the threshold value
r = 0.89; if the co-expression strength is greater
than r then the corresponding element in the matrix gets
the value 1, otherwise 0. The threshold value r = 0.89 leads
to a network with far fewer edges and results
in many disconnected components. Note that choosing
the threshold value is a crucial step and different schemes
have been proposed to select it [13, H^. We extract
the nodes and edges forming the largest connected cluster,
which has size N = 3179 and 46033 connections.
The average degree of this network is $\langle k \rangle \approx 30$. The RMT
analysis is done for this largest component. Fig. 1 shows
the adjacency matrix of this component and Fig. 2 shows the
degree distribution.



IV. RESULTS


FIG. 2: Degree distribution of the largest connected part of
the gene co-expression network for threshold 0.89.

FIG. 3: (Color online) Spacing distribution (a) and $\Delta_3(L)$
statistics (b) for the eigenvalue spectra of the gene
co-expression network. The histogram in (a) corresponds to the
numerical values and the solid line is the GOE prediction
(Eq. (1)) of RMT. The circles in (b) are the numerical results
(Eq. (2)) and the solid curve is the GOE prediction (Eq. (3))
of $\Delta_3$.

In the following, we present the various RMT results
for the gene co-expression network constructed above. We
calculate the eigenvalues and eigenvectors of the adjacency
matrix corresponding to the largest connected network.
Since this is an undirected network, the eigenvalues of
the adjacency matrix are real, and we denote them as
$\lambda_i$, $i = 1, \ldots, N$. Eigenvectors are denoted as $u^k$,
$k = 1, \ldots, N$.
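
The construction of the binary network and the diagonalization step can be sketched on toy data. The following is a minimal illustration, assuming an expression matrix with one row per probeset and one column per sample; the function names, the six-row toy matrix, and the breadth-first search used to find the largest connected component are assumptions for the example and are not taken from the original study.

```python
from collections import deque
import numpy as np

def binary_adjacency(expression, threshold):
    """Binary adjacency matrix from pairwise Pearson correlations of the rows."""
    corr = np.corrcoef(expression)
    adj = (corr > threshold).astype(int)
    np.fill_diagonal(adj, 0)  # no self-loops
    return adj

def largest_component(adj):
    """Sorted node indices of the largest connected component (breadth-first search)."""
    n = adj.shape[0]
    seen = np.zeros(n, dtype=bool)
    best = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        queue, members = deque([start]), []
        while queue:
            node = queue.popleft()
            members.append(node)
            for nb in np.flatnonzero(adj[node]):
                if not seen[nb]:
                    seen[nb] = True
                    queue.append(int(nb))
        if len(members) > len(best):
            best = members
    return np.array(sorted(best))

# Toy expression data (hypothetical): rows 0-2 are strongly co-expressed,
# rows 3-4 form a second pair, and row 5 is not linked to any other row.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
expression = np.array([
    x,
    2.0 * x + 1.0,
    [1.0, 2.0, 3.0, 4.0, 5.0, 7.0],
    y,
    3.0 * y,
    [2.0, 2.0, 1.0, 1.0, 3.0, 3.0],
])

adj = binary_adjacency(expression, threshold=0.89)
nodes = largest_component(adj)
sub = adj[np.ix_(nodes, nodes)]  # renumbered largest component
eigenvalues, eigenvectors = np.linalg.eigh(sub.astype(float))
print("nodes in largest component:", nodes.tolist())
print("number of edges:", sub.sum() // 2)
print("eigenvalues:", eigenvalues)
```

For data of the size used here (thousands of probesets) the same steps apply, although a sparse representation of the adjacency matrix would likely be preferable for the component search.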



A. Spacing distribution and A3 analysis 

From this spectrum we calculate the NNSD $P(s)$ as described
in Section II and the $\Delta_3(L)$ statistic using Eq. (2).
Fig. 3(a) shows that the NNSD agrees well with the NNSD
of GOE matrices (Eq. (1)), with the value of the Brody
parameter $\beta \approx 1$.

Fig. 3(b) plots the $\Delta_3(L)$ statistics. It can be seen
that the $\Delta_3(L)$ statistic agrees well with the GOE statistics
up to $L \approx 25$ (which is much less than
the corresponding value for random and scale-free
model networks 0). According to RMT, this implies
that besides randomness, the network has some specific
features. Note that the points which deviate from the GOE
statistics ($L > 20$), as shown in Fig. 3(b), can also
be analyzed using deformed GOE statistics, as shown in
Ref. [21].



B. Eigenvector analysis 

Having calculated the spacing distribution and $\Delta_3$
statistics, we now use eigenvector analysis to study the factors
responsible for the deviation from RMT. We calculate the
IPR and entropy for all the eigenvectors. The eigenvectors
whose IPR and entropy deviate from the random
matrix predictions carry the relevant information.
The nodes corresponding to the top contributing components
of these vectors may be important nodes in terms
of the functionality of the whole network. In the following
we present the eigenvector analysis results for the gene
co-expression network.

Fig. 4(a) shows the eigenvalues in increasing order.
Apart from the clearly distinguishable high eigenvalues towards
the end of the spectrum, there is a flat part around the zero
eigenvalue. Real-world networks, in general, are very
sparse and are reported to have a large number of zero
eigenvalues [30, 31]. For the network we consider
here, however, out of 3179 eigenvalues only approximately 73
($\approx$ 2.5% of all eigenvalues) are degenerate with the value
zero. The degeneracy at the zero eigenvalue is lower than in
many other real-world networks. There are nearly
3106 non-degenerate eigenvalues, which could be taken
as the effective dimensionality of the network.

We also calculate the Shannon entropy for all the eigenvectors
using Eq. (5), and compare the values with those of
random vectors. Fig. 4(b) shows the entropy as a
function of eigen number. According to RMT, the Shannon
entropy of a random vector of dimension N = 3106
is $\ln(3106/2) \approx 7.35$. Furthermore, the RMT-predicted
value for the Shannon entropy of a random vector of
dimension N = 73 (corresponding to the degenerate part) is
$\ln(73/2) \approx 3.6$. Based on these calculations, we can
divide the eigenvalues into three sets: (A) the non-degenerate
part that follows RMT; (B) the non-degenerate part,
at both ends and at intermediate eigenvalues, which
deviates from RMT and is expected to contain information
about important nodes in the network; (C) the degenerate
part with zero eigenvalue, $\lambda_{1636}$ to $\lambda_{1708}$, which
fluctuates around the RMT-predicted value.
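
The separation of the spectrum into a degenerate zero part and a non-degenerate part, together with the entropy benchmark ln(N/2) quoted above, can be reproduced with a few lines. This is a hedged sketch: the tolerance used to decide that an eigenvalue is zero, the function names, and the star-graph test case are assumptions for illustration only.

```python
import numpy as np

def split_spectrum(eigenvalues, tol=1e-8):
    """Count (numerically) zero and non-zero eigenvalues."""
    lam = np.asarray(eigenvalues, dtype=float)
    n_zero = int(np.sum(np.abs(lam) < tol))
    return n_zero, lam.size - n_zero

def entropy_prediction(dimension):
    """Entropy benchmark ln(N/2) for a random vector of the given dimension."""
    return float(np.log(dimension / 2.0))

# Toy check: a star graph with one hub and five leaves has eigenvalues
# +sqrt(5), -sqrt(5) and a four-fold degenerate zero eigenvalue.
star = np.zeros((6, 6))
star[0, 1:] = 1.0
star[1:, 0] = 1.0
n_zero, n_nonzero = split_spectrum(np.linalg.eigvalsh(star))
print("star graph: zero eigenvalues =", n_zero, ", non-zero =", n_nonzero)

# Benchmarks for the dimensions quoted in the text.
print("ln(3106/2) =", round(entropy_prediction(3106), 2))
print("ln(73/2)   =", round(entropy_prediction(73), 2))
```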

FIG. 4: (Color online) Eigenvalues (a), entropy (b), and IPR
(c) as a function of eigen number for the threshold value of
0.89. Open blue circles in (c) correspond to the localized
eigenvectors whose top contributing nodes are listed in
Table I.

TABLE I: Top five largest contributing nodes in localized
eigenvectors for the network constructed with the threshold value
of 0.89. The nodes are written in the original gene number as
given in the datasets. Each row lists the nodes of one
eigenvector, from the first to the fifth contributing node.

Set  Top five contributing probesets
B    202060_at    217731_s_at  201121_s_at  221775_x_at  229630_s_at
B    227636_at    205003_at    211940_x_at  224616_at    222203_s_at
B    202916_s_at  226832_at    209860_s_at  218175_at    221810_at
C    225921_at    212635_at    208645_s_at  221511_x_at  231896_s_at
C    214351_x_at  203034_s_at  200673_at    221471_at    225950_at

TABLE II: Top contributing nodes (genes) in the localized
eigenvectors for the threshold value 0.91. Each row lists the
nodes of one eigenvector.

Set  Top five contributing probesets
B    210338_s_at  210418_s_at  202178_at    38398_at     213347_x_at
B    208666_s_at  224819_at    209460_at    226395_at    201525_at
B    201121_s_at  208667_s_at  223716_s_at  224644_at    200626_s_at
C    211733_x_at  230869_at    228045_at    211733_x_at  242317_at
C    201494_at    223209_s_at  225284_at    201494_at    212788_x_at
C    230416_at    228283_at    238494_at    230416_at    212474_at

Furthermore, we calculate the IPR of all the eigenvectors
using Eq. (6) and plot it in Fig. 4(c). It shows that
several eigenvectors are localized. For example, the vectors
corresponding to the 1140th to 1148th eigenvalues have
$I^k > 0.1$, showing that a few components contribute more
than the other components. Localized eigenvectors
corresponding to non-degenerate eigenvalues from set (B)
include vectors with $I^k \approx 0.5$ and $I^k \approx 0.31$, and
$u^{2257}$ (with $I^k \approx 0.25$). Some of the
localized eigenvectors corresponding to zero eigenvalues
(set (C)) are $u^{1670}$ and $u^{1671}$ (with $I^k \approx 0.5$).
We next analyze the significant contributors
of the eigenvectors deviating from the RMT predictions.
One of these eigenvectors contains approximately
20 significant participants. Table I presents the
top 5 significant contributors (nodes) of the
localized eigenvectors mentioned above. Note that the
original gene numbers are written as in the datasets Q.
As shown in Fig. 2, the degree distribution of the
connected network analyzed above follows a power law with
a fat tail, which means that a few nodes are hubs and
carry the whole network. But random matrix analysis of
the eigenvectors reveals that all the most contributing nodes
listed above have rather small degree. They are almost
all towards the bottom of the power-law distribution.

The degrees of all the top contributing nodes in the
localized eigenvectors are either well below or around
the average degree of the network. The gene
assigned to probeset 202060_at (corresponding to
node 2299 in the renumbered network), which is the first
top contributing node of its eigenvector,
has degree 15; the second top contributing node has
degree 17 and the third has degree 20. The fourth and
fifth top contributing nodes have degree 9 each. The top
five nodes of another set (B) eigenvector have degrees 21, 14, 7,
17 and 24, and those of the remaining one
have degrees 1, 1, 6, 3 and 1, respectively. For the two localized
eigenvectors of set (C), the
top five contributing nodes have degrees, in sequential
order from the first to the fifth contributing node (see
Table I), 2, 4, 8, 1, 3 and 10, 9, 23, 14, 2, respectively.
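
The step from localized eigenvectors to their top contributing nodes and the degrees of those nodes can be sketched as follows. This is an illustrative example on a synthetic graph, not the authors' pipeline: the function name, the IPR cutoff of 0.1 (echoing the $I^k > 0.1$ example in the text), and the toy network (a random graph in which two extra degree-one nodes are attached to the same node, which produces a zero-eigenvalue eigenvector supported only on those two nodes) are assumptions.

```python
import numpy as np

def localized_eigenvectors(adj, ipr_cutoff=0.1, n_top=5):
    """Return eigenvalues, IPR values, and the top nodes of localized eigenvectors.

    The report maps an eigenvector index k (with IPR above the cutoff) to a
    list of (node, degree) pairs, ordered from the largest contribution down.
    """
    eigenvalues, vectors = np.linalg.eigh(adj.astype(float))
    weights = vectors**2
    ipr = np.sum(weights**2, axis=0)
    degree = adj.sum(axis=1)
    report = {}
    for k in np.flatnonzero(ipr > ipr_cutoff):
        top = np.argsort(weights[:, k])[::-1][:n_top]
        report[int(k)] = [(int(v), int(degree[v])) for v in top]
    return eigenvalues, ipr, report

# Synthetic network (hypothetical): a random graph on 60 nodes plus two
# degree-one nodes (labels 60 and 61) that are both attached to node 0.
rng = np.random.default_rng(1)
n = 60
upper = np.triu(rng.random((n, n)) < 0.2, k=1)
adj = np.zeros((n + 2, n + 2), dtype=int)
adj[:n, :n] = upper | upper.T
adj[0, [n, n + 1]] = 1
adj[[n, n + 1], 0] = 1

eigenvalues, ipr, report = localized_eigenvectors(adj)
k_zero = int(np.argmin(np.abs(eigenvalues)))  # eigenvector of the zero eigenvalue
print("IPR of the zero-eigenvalue eigenvector:", ipr[k_zero])
print("its top contributing (node, degree) pairs:", report[k_zero])
```

In this toy case the localized eigenvector is carried by two nodes of degree one rather than by the high-degree nodes, which mirrors the observation that the top contributors need not be hubs.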

Now we change the threshold value to 0.91. This threshold
value leads to 25,000 connections in the whole network.
This network has a largest connected cluster of size
2,439 with 22546 connections. The average
degree of this network is $\langle k \rangle \approx 20$. Again we renumber
the nodes such that the nodes in the connected component
take values from 1 to 2,439, and calculate the eigenvalues
and eigenvectors of the adjacency matrix corresponding
to this largest connected network. From the
spectrum the NNSD and $\Delta_3$ statistics are calculated, and
these two show GOE statistics similar to those shown in Fig. 3
for r = 0.89.

Fig. 5 plots eigenvalues (a), entropy (b) and IPR (c)
as a function of eigen number. Entropy and IPR are
calculated using Eqs. (5) and (6), respectively. Out of
2,439 eigenvalues, approximately 96 are degenerate with
the value zero. It means that there are nearly 2343
non-degenerate eigenvalues, which could be taken as the
effective dimensionality of the network. According to
RMT, the Shannon entropy of a random vector of dimension
N = 2343 is $\ln(2343/2) \approx 7.0$. On the other hand, the
RMT-predicted value of the Shannon entropy for the degenerate
eigenvectors is $\ln(96/2) \approx 3.9$. Based on these
calculations, we can again divide the eigenvalues into three
sets (A), (B) and (C). The localized eigenvectors corresponding
to the non-degenerate part include one with IPR = 0.41,
$u^{1635}$ (IPR = 0.3), and two with $\lambda = 1$ (IPR = 0.195 and
0.24). The localized eigenstates corresponding to zero
eigenvalues (set (C)) are three eigenvectors, among them
$u^{1269}$ and $u^{1270}$, with IPR = 0.38, 0.37 and 0.28.
Significant contributors in the localized eigenvectors are
listed in Table II.

The degree distribution of the largest component at
this threshold follows a power law as well, revealing the
scale-free nature of this component. Increasing the threshold
preserves the scale-free property of the network. Some
nodes are hubs which carry the whole network and enjoy
structural importance. Again we find that the
top contributing nodes are not the ones with very high
degree. For the two different threshold values, Tables I and
II show the largest contributing co-expressing genes in
the corresponding localized eigenvectors. We find that
choosing the threshold is very important for the analysis of
gene co-expression networks, as the top five
largest contributing nodes differ entirely (except one) as
the threshold value is changed. This suggests that, though
the gross structure of the whole network (Fig. 1) and the
scale-free property remain unchanged, the value of the threshold
has a strong effect on the network, leading to entirely different
sets (except a few) of largest contributing nodes for the two
threshold values. The Appendix lists the gene names
corresponding to the probeset identifiers given in Tables I
and II.

FIG. 5: (Color online) Same as Fig. 4 but for the threshold value
of 0.91. Open blue circles correspond to localized eigenvectors
whose top five contributing nodes are presented in
Table II.



V. CONCLUSIONS AND DISCUSSIONS 

Using RMT, we have analyzed the gene co-expression network
constructed by applying two different threshold values
to the data obtained from six brain regions that are
metabolically relevant to Alzheimer's disease Q. The
NNSD of the adjacency matrix of the largest connected
component of the network follows universal GOE statistics
(with $\beta \approx 1$). This universality adds one more feature,
based on the spectral correlations, that the gene
co-expression network has in common with different model
networks 0] proposed to capture various structural
properties of real-world networks.

The NNSD gives information about the short-range
correlations among the eigenvalues. To probe the long-range
correlations we have studied the spectral rigidity via the
$\Delta_3(L)$ statistics. This analysis shows that the gene
co-expression network considered here follows the RMT
prediction of GOE over a range of L up to $L \approx 25$
(Sec. IV A). Beyond this value
of L a deviation in the spectral rigidity is seen, indicating
a possible breakdown of universality. This means that the
network under consideration has sufficient randomness,
which may be due to the robustness of the system, together with
regularity, which may serve to perform some functional task. A
mixture of random connections and regular structure has
been emphasized at various places; for instance, information
processing in the brain is considered to arise from random
connections among different modular structures [32].

Deviations from the universal RMT predictions identify
system-specific, non-random properties of the system under
consideration and might provide clues about important
interactions. To extract this system-dependent information
we have performed eigenvector analysis. This analysis
reveals that there are some eigenvectors which are highly
localized. The component $u_l^k$ of a given eigenvector
relates to the contribution of node (corresponding gene) $l$
to that eigenvector. Hence, the distribution of the
components contains information about the number of genes
contributing to a specific eigenvector. The inverse
participation ratio (IPR), as defined in Eq. (6), distinguishes
between an eigenvector with approximately equal
components and one with a small number of large
components. According to the RMT predictions, the largest
contributing nodes (genes) in the localized eigenvectors
may have an important function, or important functional
relations among them.

The largest connected component is scale-free, indicating
the structural importance of a few nodes (hubs). Eigenvector
analysis shows that the top contributing nodes in the
localized eigenvectors have relatively low degrees. Note
that genes which are hubs or which connect different
communities are also important, as shown by several
earlier studies in the network framework 0, , but the
aim of the present work is to look for important genes
beyond these structural measures. Changing the value of the
threshold, while keeping the scale-free structure of the
network the same, has a drastic impact on the localization
properties of the eigenvectors. Almost all the top contributing
nodes differ for the two threshold values, indicating
an impact on the global properties of the underlying
network.

Last, we discuss the importance of the analysis
and the future implications of the results presented in this
paper. Several studies have shown that the development
of multi-target drugs might give better results than
the traditional methods targeting a single protein.
Single-target design might not always give satisfactory
results, as there might be a backup system which replaces
the function of the inhibited target protein. By using
multi-target drugs one can decrease the functionality of
entire protein cascades, producing more effective results.
For example, studies have shown that aging is strongly
linked with age-related diseases, and that they share a
common signaling network. Signaling hubs of the age-related
protein-protein interaction subnetwork may be good
candidates for age-related drug targets. Multi-target drugs
attacking hubs of the protein-protein interaction network,
'hub-links' (links connecting hubs), bridges (inter-modular
links having high 'betweenness centrality') or
nodes in the overlap of numerous network modules might
give better results [33, 1131. Similarly, targeting genes
corresponding to the largest contributing nodes in localized
eigenvectors may lead to important effects as well. Further
investigations are needed to determine the functionality
of the genes corresponding to the top contributing
nodes in the localized eigenvectors, which could then be
used for such multi-target drug designs.



Appendix

Tables III and IV correspond to the probeset identifiers
from Tables I and II, respectively. The first column of these
tables gives the probeset identifiers (Affymetrix ID) and the
second column gives the corresponding gene names. However,
the function of some transcripts is not known yet,
and some of them have no gene name. The value '-' in
the gene name column indicates that the information is not
available. Note that there are many reasons for probesets
lacking detailed annotation. We know the sequence on the
microarray for each probeset. On the chip we get all
expressed genes, but we do not have reliable information on
all the gene functions. As knowledge grows with
the latest available technologies, this gap is decreasing
with time. The one certain piece of information for a probeset
is the Affymetrix ID as given in Tables I and II [26].



TABLE III: Gene names corresponding to the probesets for
the threshold value 0.89

Probeset      Gene name
202060_at     Ctr9, Paf1/RNA polymerase II
227636_at
202916_s_at   family with sequence similarity 20, member B
225921_at     ninein (GSK3B interacting protein)
214351_x_at   ribosomal protein L13
217731_s_at   integral membrane protein 2B
205003_at     dedicator of cytokinesis 4
226832_at
212635_at     transportin 1
203034_s_at   ribosomal protein L27a
201121_s_at   progesterone receptor membrane component 1
211940_x_at
209860_s_at   annexin A7
208645_s_at   ribosomal protein S14
200673_at     lysosomal protein transmembrane 4 alpha
221775_x_at   ribosomal protein L22
224616_at     dynein, cytoplasmic 1
218175_at     coiled-coil domain containing 92
221511_x_at   cell cycle progression 1
221471_at     serine incorporator 3
229630_s_at   Wilms tumor 1 associated protein
222203_s_at   retinol dehydrogenase 14
221810_at     RAB15, member RAS oncogene family
231896_s_at   density-regulated protein
225950_at
210338_s_at
201121_s_at
230416_at
210418_s_at
224819_at
208667_s_at
209460_at
223716_s_at
238494_at
38398_at
226395_at
224644_at
211733_x_at
201494_at
230416_at
213347_x_at
201535_at
200626_s_at
242317_at
212788_x_at
212474_at

suppression of tumorigenicity 13
protein kinase C, zeta
zinc finger, RAN-binding domain
DnaJ (Hsp40) homolog, subfamily C
TNF receptor-associated factor 3
MAP-kinase activating death domain
hook homolog 3 (Drosophila)
sterol carrier protein 2
AVL9 homolog (S. cerevisiae)